Contracted Suffix Trees: A Simple and Dynamic Text Indexing Data Structure

نویسندگان

  • Andrzej Ehrenfeucht
  • Ross M. McConnell
  • Sung-Whan Woo
چکیده

We address the problem of finding the locations of all instances of a string P in a text T , where of T is allowed to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing, and adapt it to the new problem. We can then produce a list of k occurrences of any string P in T in O(||P || + k) time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in O(||T ||) time, which is much faster than Coffman and Eve’s algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Inferring Strings from Suffix Trees and Links on a Binary Alphabet

A suffix tree, which provides us with a linear space full-text index of a given string, is a fundamental data structure for string processing and information retrieval. In this paper we consider the reverse engineering problem on suffix trees: Given an unlabeled ordered rooted tree T accompanied with a node-to-node transition function f , infer a string whose suffix tree and its suffix links fo...

متن کامل

Sparse Directed Acyclic Word Graphs

The suffix tree of string w is a text indexing structure that represents all suffixes ofw. A sparse suffix tree ofw represents only a subset of suffixes of w. An application to sparse suffix trees is composite pattern discovery from biological sequences. In this paper, we introduce a new data structure named sparse directed acyclic word graphs (SDAWGs), which are a sparse text indexing version ...

متن کامل

Position heaps: A simple and dynamic text indexing data structure

We address the problem of finding the locations of all instances of a string P in a text T , where preprocessing of T is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing [1], and adapt it to the...

متن کامل

Fully-online Construction of Suffix Trees for Multiple Texts

We consider fully-online construction of indexing data structures for multiple texts. Let T = {T1, . . . , TK} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts in which, after a new character is appended to the kth text ...

متن کامل

Range Median of Minima Queries, Super-Cartesian Trees, and Text Indexing

A Range Minimum Query asks for the position of a minimal element between two specified array-indices. We consider a natural extension of this, where our further constraint is that if the minimum in a query interval is not unique, then the query should return an approximation of the median position among all positions that attain this minimum. We present a succinct preprocessing scheme using onl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009